A Pattern-mining Driven Study on Differences of Newspapers in Expressing Temporal Information
This paper studies how different types of newspapers differ in expressing
temporal information, a topic that has received little attention. Techniques
from the fields of temporal processing and pattern mining
are employed to investigate this topic. First, a corpus annotated with temporal
information is created by the author. Then, sequences of temporal information
tags mixed with part-of-speech tags are extracted from the corpus. The TKS
algorithm is used to mine skip-gram patterns from the sequences. With these
patterns, the signatures of the four newspapers are obtained. To make each
signature uniquely characterize its newspaper, we revise the signatures by
removing reference patterns. By examining the number of patterns in the
signatures and revised signatures, the proportion of patterns containing
temporal information tags, and the specific patterns containing such tags, the
study finds that the newspapers differ in how they express temporal
information. Comment: 19 pages
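The mining and revision steps can be pictured with a small sketch. It is an illustration under stated assumptions, not the paper's method. It does not implement TKS. Instead it enumerates every k-skip-n-gram in each tag sequence, counts how many sequences contain each one, and keeps the top-k as a newspaper's "signature". The abstract does not say what a "reference pattern" is, so the sketch assumes it is any pattern that also appears in another newspaper's signature. The function names (`skipgrams`, `top_k_signature`, `revise`), the tags `DATE` and `TIME`, and the toy data are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def skipgrams(seq, n=2, k=1):
    """Return the set of n-element subsequences of seq that skip at most k tokens in total."""
    grams = set()
    for i, first in enumerate(seq):
        # Tokens that may follow `first` without exceeding k skips in total.
        window = seq[i + 1 : i + n + k]
        for rest in combinations(range(len(window)), n - 1):
            grams.add((first,) + tuple(window[j] for j in rest))
    return grams


def top_k_signature(sequences, n=2, k=1, top_k=3):
    """Brute-force stand-in for a top-k miner such as TKS (NOT TKS itself).

    Support = number of sequences containing the skip-gram; ties are broken
    lexicographically so the result is deterministic.
    """
    support = Counter()
    for seq in sequences:
        support.update(skipgrams(seq, n, k))  # each pattern counted once per sequence
    ranked = sorted(support.items(), key=lambda item: (-item[1], item[0]))
    return {pattern for pattern, _ in ranked[:top_k]}


def revise(signatures):
    """ASSUMPTION: a 'reference pattern' is one that also occurs in another newspaper's signature."""
    revised = {}
    for name, signature in signatures.items():
        others = set().union(*(s for o, s in signatures.items() if o != name))
        revised[name] = signature - others
    return revised


# Hypothetical toy data: part-of-speech tags mixed with temporal tags (DATE, TIME).
paper_a = [["DATE", "NN", "VB"], ["NN", "VB", "DATE"]]
paper_b = [["NN", "VB", "TIME"], ["TIME", "NN", "VB"]]
signatures = {
    "A": top_k_signature(paper_a, top_k=2),
    "B": top_k_signature(paper_b, top_k=2),
}
revised = revise(signatures)
# The shared ("NN", "VB") pattern is dropped; A keeps ("DATE", "NN") and B keeps ("NN", "TIME").
```

In this toy run, the pattern that both signatures share is purely part-of-speech. Removing it leaves only patterns that contain temporal tags, which mirrors the motivation the abstract gives for revising the signatures. Enumerating every skip-gram is exponential in `n`, which is why a dedicated top-k miner is used in practice.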
Issues in Designing a Corpus of Spoken Irish
This paper describes the stages involved in implementing a corpus of spoken Irish. This pilot project (consisting of approximately 140K of transcribed data) implements parts of the design of a larger corpus of spoken Irish, which it is hoped will contain approximately 2 million words when complete. It is hoped that such a corpus will provide material for linguistic research, lexicography, the teaching of Irish, and the development of language technology for the Irish language.